How to Choose a Champion 
League competition is investigated using random processes and scaling techniques. In our model,
a weak team can upset a strong team with a fixed probability. Teams play an equal number of
head-to-head matches and the team with the largest number of wins is declared to be the champion.
The total number of games needed for the best team to win the championship with high certainty,
$T$, grows as the cube of the number of teams, $N$, i.e., $T \sim N^3$. This number can be substantially
reduced using preliminary rounds where teams play a small number of games and subsequently,
only the top teams advance to the next round. When there are $k$ rounds, the total number of
games needed for the best team to emerge as champion, $T_k$, scales as follows, $T_k \sim N^{\gamma_k}$ with
$\gamma_k = [1 - (2/3)^{k+1}]^{-1}$. For example, $\gamma_k = 9/5, 27/19, 81/65$ for $k = 1, 2, 3$. These results suggest
an algorithm for how to infer the best team using a schedule that is linear in $N$. We conclude that
league format is an ineffective method of determining the best team, and that sequential elimination
from the bottom up is fair and efficient.

PACS numbers: 02.50.-r, 01.50.Rt, 05.40.-a, 89.75.Da 



I. INTRODUCTION 

Competition is ubiquitous in physical, biological, sociological, and economical
processes. Examples include ordering kinetics where large domains grow at the
expense of small ones, evolution where fitter species thrive at the expense of
weaker species, social stratification where humans vie for social status, and
the business world where companies compete for market share.

The world of sports provides an ideal laboratory for modeling competition
because game data are accurate, abundant, and accessible. Moreover, since sports
competitions are typically head-to-head, sports can be viewed as an interacting
particle system, enabling analogies with physical systems that evolve via binary
interactions [9, 10, 11, 12]. For instance, sports nicely demonstrate that the
outcome of a single competition is not predictable [12, 13]. Over the past
century the lower seeded team had an astounding 44% chance of defeating a higher
seeded team in baseball. The same is true for other competitions in arts,
science, and politics. This inherent randomness has profound consequences. Even
after a long series of competitions, the best team does not always finish first.

To understand how randomness affects the outcome of 
multiple competitions, we study an idealized system. In 
our model league, there are N teams ranked from best 
to worst, so that in each match there is a well-defined 
favorite and underdog. We assume that the weaker team 
can defeat the stronger team with a fixed probability. Using random walk
properties and scaling techniques analogous to those used in polymer physics, we
study the rank of the champion as a function of the number of teams and the
number of games. We find that a huge number of games, $T \sim N^3$, is needed to
guarantee that the best team becomes the champion.



We suggest that a more efficient strategy to decide 
champions is to set up preliminary rounds where a small 
number of games is played and based on the outcome 
of these games, only the top teams advance to the next 
round. In the final championship round, M teams play a 
sufficient number of games to decide the champion. 
Using $k$ carefully constructed preliminary rounds, the required number of
games, $T_k$, can be reduced significantly,

$$T_k \sim N^{\gamma_k}, \qquad \gamma_k = \frac{1}{1 - (2/3)^{k+1}}. \tag{1}$$

Remarkably, it is possible to approach the optimal limit of linear scaling using
a large number of preliminary rounds.



II. LEAGUE COMPETITION 

Our model league consists of N teams that compete in 
head-to-head matches. We assume that each team has 
an innate strength and that no two teams are equal. The 
teams are ranked from 1 (the best team) to N (the worst 
team). This ranking is fixed and does not evolve with 
time. The teams play a fixed number of head-to-head 
games, and each game produces a winner and a loser. In 
our model, the stronger (lower seed) team is considered 
to be the favorite and the weaker (higher seed) team is 
considered to be the underdog. The outcome of each
match is stochastic: the underdog wins with the upset
probability $0 < q < 1/2$ and the favorite wins with the
complementary probability $p = 1 - q$. The team with the
largest number of wins is the champion.

Since the better team does not necessarily win a game, 
the best team does not necessarily win the championship. 
In this study, we address the following questions: How 
many games are needed for the best team to finish first? 
What is the typical rank of a champion decided by a 
relatively small number of games? What is the optimal
way to choose a champion? 

We answer these questions using scaling techniques. 
Consider the nth ranked team with $1 \le n \le N$. This team is inferior to a
fraction $\frac{n-1}{N-1}$ of the $N - 1$ remaining teams and superior to a
fraction $\frac{N-n}{N-1}$ of the teams. Therefore, the probability $P_n$ that
this team wins a game against a randomly chosen opponent is a linear combination
of the probabilities $p$ and $q$,

$$P_n = p\,\frac{N-n}{N-1} + q\,\frac{n-1}{N-1}. \tag{2}$$

Using $p = 1 - q$, the probability $P_n$ can be rewritten as follows

$$P_n = p - (2p-1)\,\frac{n-1}{N-1}. \tag{3}$$

The latter varies linearly with rank: it is largest for the best team,
$P_1 = p$, and smallest for the worst team, $P_N = q$.
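
As a quick numerical check of Eqs. (2) and (3), the following minimal sketch evaluates both forms of $P_n$ and confirms that they agree. The function names `win_probability` and `win_probability_from_mixture` are illustrative choices, not part of the original analysis.

```python
def win_probability(n: int, num_teams: int, p: float) -> float:
    """Eq. (3): chance that the team of rank n beats a randomly chosen opponent."""
    if not 1 <= n <= num_teams:
        raise ValueError("rank must satisfy 1 <= n <= N")
    return p - (2 * p - 1) * (n - 1) / (num_teams - 1)


def win_probability_from_mixture(n: int, num_teams: int, p: float) -> float:
    """Eq. (2): weighted mix of p (against weaker teams) and q (against stronger teams)."""
    q = 1 - p
    stronger = (n - 1) / (num_teams - 1)          # fraction of opponents ranked above n
    weaker = (num_teams - n) / (num_teams - 1)    # fraction of opponents ranked below n
    return p * weaker + q * stronger


if __name__ == "__main__":
    N, p = 10, 0.75
    for n in (1, 5, 10):
        print(n, win_probability(n, N, p), win_probability_from_mixture(n, N, p))
```

The two endpoints reproduce $P_1 = p$ and $P_N = q$, and the linear interpolation between them is what drives the scaling arguments that follow.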

Now, suppose that the nth team plays $t$ games, each against a randomly chosen
opponent. The number of wins it accumulates, $w_n(t)$, is a random quantity that
grows as follows

$$w_n(t+1) = \begin{cases} w_n(t) + 1 & \text{with probability } P_n, \\ w_n(t) & \text{with probability } 1 - P_n. \end{cases} \tag{4}$$



The initial condition is $w_n(0) = 0$. The number of wins performs a biased
random walk and as a result, when the number of games is large, the quantity
$w_n(t)$ is well-characterized by its average $W_n(t) = \langle w_n(t) \rangle$
and its standard deviation $\sigma_n(t)$, defined via
$\sigma_n^2(t) = \langle w_n^2(t) \rangle - \langle w_n(t) \rangle^2$. Here, the
brackets denote averaging over infinitely many realizations of the random
process. Since the outcome of a game is completely independent of all other
games, the average number of wins and the variance in the number of wins are
both proportional to the number of games played,

$$W_n = P_n\, t, \tag{5a}$$
$$\sigma_n^2 = D_n\, t, \qquad D_n = P_n (1 - P_n). \tag{5b}$$

Both of these quantities follow from the behavior after one game: since
$w_n(1) = 1$ with probability $P_n$ and $w_n(1) = 0$ with probability
$1 - P_n$, then $\langle w_n(1) \rangle = \langle w_n^2(1) \rangle = P_n$.
Moreover, the distribution of the number of wins is binomial and for large $t$,
it approaches a Gaussian, fully characterized by the average and the standard
deviation [17].
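
The linear growth of the mean and the variance can be checked with a small Monte Carlo experiment. The sketch below is only illustrative: the helper names, the sample sizes, and the use of Python's `random` module are assumptions made here, not details taken from the original study.

```python
import random


def simulate_wins(win_prob: float, games: int, rng: random.Random) -> int:
    """One realization of the biased random walk of Eq. (4), started at zero wins."""
    return sum(1 for _ in range(games) if rng.random() < win_prob)


def sample_mean_and_variance(win_prob: float, games: int,
                             realizations: int, seed: int = 0):
    """Estimate the mean and variance of the number of wins over many realizations."""
    rng = random.Random(seed)
    samples = [simulate_wins(win_prob, games, rng) for _ in range(realizations)]
    mean = sum(samples) / realizations
    variance = sum((s - mean) ** 2 for s in samples) / (realizations - 1)
    return mean, variance


if __name__ == "__main__":
    P, t = 0.6, 400
    mean, variance = sample_mean_and_variance(P, t, realizations=2000, seed=1)
    print("measured mean    ", mean, " expected P*t      =", P * t)
    print("measured variance", variance, " expected P(1-P)*t =", P * (1 - P) * t)
```

The measured values should fluctuate around $P_n t$ and $P_n(1-P_n)t$, with statistical errors that shrink as the number of realizations grows.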

The quantities $W_n$ and $\sigma_n$ can be used to understand key features of
this system. Let us assume that each team plays $t$ games against randomly
selected opponents and compare the best team with the nth ranked team. Since
$P_1 > P_n$, the best team accumulates wins at a faster rate, and after playing
sufficiently many games, the best team should be ahead. However, since there is
a diffusive-like uncertainty in the number of wins, $\sigma_n \sim \sqrt{t}$, it
is possible that the nth ranked team has more wins when $t$ is small. The number
of wins of the nth team is comparable with that of the best team as long as
$W_1(t) - W_n(t) \propto \sigma_1(t)$, or, using (3) and (5) together with
$(n-1)/(N-1) \approx n/N$ for large leagues,

$$(2p-1)\,\frac{n}{N}\,t \sim \sqrt{t}. \tag{6}$$



Since the diffusion coefficient $D_n = P_n(1 - P_n)$ in (5b) varies only weakly
with $n$, $pq \le D_n \le 1/4$, this dependence is tacitly ignored. When these
two teams have a comparable number of wins, they have comparable chances to
finish first. Hence, Eq. (6) yields the characteristic rank of the champion,
$n_*$, as a function of the number of teams $N$ and the number of games $t$,

$$n_* \sim \frac{1}{2p-1}\,\frac{N}{\sqrt{t}}. \tag{7}$$



Since we are primarily interested in the behavior as a function of $t$ and $N$,
the dependence on the probability $p$ is henceforth left implicit. As expected,
the champion becomes stronger as the number of games increases (recall that
small $n$ represents a stronger team). By substituting $n_* \sim 1$ into (7), we
deduce that the number of games each team must play for the best team to win
is $t_* \sim N^2$.

Since each of the $N$ teams plays $\sim N^2$ games, the total number of games
required for the best team to emerge as the champion with high certainty grows
as the cubic power of the number of teams,

$$T \sim N^3. \tag{8}$$



This result has significant implications. In most sports leagues, two teams face
each other a fixed number of times, usually once or twice. The corresponding
total number of games, $\sim N^2$, is much smaller than (8). In this common
league format, the typical rank of the champion scales as $n_* \sim \sqrt{N}$.
Such a season is much too short as it enables weak teams to win championships.
Indeed, it is not uncommon for the top two teams to trade places until the very
end of the season or for two teams to tie for first, a clear indication that the
season length is too short.

We may also consider the probability distribution $Q_n(t)$ for the nth ranked
team to win after $t$ games. We expect that the scale $n_*$ characterizes the
entire distribution function,

$$Q_n(t) \sim \frac{1}{n_*}\,\psi\!\left(\frac{n}{n_*}\right). \tag{9}$$

Assuming $\psi(0)$ is finite, the probability that the best team wins scales as
follows, $Q_1 \sim 1/n_*$. This quantity first grows, $Q_1(t) \sim \sqrt{t}/N$
when $t \ll N^2$, and then, it saturates, $Q_1(t) \approx 1$ when $t \gg N^2$.

The likelihood of major upsets is quantified by the tail of the scaling function
$\psi(z)$. Generally, the champion wins $pt$ games (we neglect the diffusive
correction). The probability that the weakest team becomes champion by reaching
that many wins is
$Q_N(t) \sim \binom{t}{pt}\, q^{pt} p^{qt} \sim (q/p)^{(p-q)t}$, where the
asymptotic behavior follows from the Stirling formula
$\ln t! \simeq t \ln t - t$. We conclude that the probability of the weakest
team winning decays exponentially with the number of games,
$Q_N(t) \sim \exp(-\mathrm{const} \times t)$. Yet, from (7) and (9),
$Q_N(t) \sim \psi(\sqrt{t})$ and therefore, the tail of the probability
distribution is Gaussian,

$$\psi(z) \sim \exp\left(-\mathrm{const} \times z^2\right) \tag{10}$$

as $z \to \infty$, thereby implying that upset champions are extremely
improbable. We note that single-elimination tournaments produce upset champions
with a much higher probability because the corresponding distribution function
has an algebraic tail [11]. We conclude that leagues have a much narrower range
of outcomes and in this sense, leagues are more fair than tournaments.
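
The exponential decay rate quoted above can be checked numerically. The sketch below compares the exact logarithm of $\binom{t}{pt} q^{pt} p^{qt}$, evaluated with `math.lgamma`, against the asymptotic rate $(p-q)\ln(q/p)$ per game; the function names are illustrative and the choice $p = 3/4$ simply mirrors the upset probability $q = 1/4$ used in the simulations.

```python
import math


def log_upset_probability(t: int, p: float) -> float:
    """ln of C(t, pt) q^{pt} p^{qt}: the weakest team collecting pt wins in t games."""
    q = 1 - p
    wins = round(p * t)
    log_binom = (math.lgamma(t + 1) - math.lgamma(wins + 1)
                 - math.lgamma(t - wins + 1))
    return log_binom + wins * math.log(q) + (t - wins) * math.log(p)


def asymptotic_decay_rate(p: float) -> float:
    """Leading-order rate per game from Stirling's formula: (p - q) ln(q / p)."""
    q = 1 - p
    return (p - q) * math.log(q / p)


if __name__ == "__main__":
    p = 0.75
    for t in (100, 1000, 10000):
        print(t, log_upset_probability(t, p) / t, asymptotic_decay_rate(p))
```

The exact rate per game approaches the asymptotic one as $t$ grows; the residual difference comes from the sub-leading $\ln t$ correction that the Stirling estimate neglects.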



III. PRELIMINARY ROUNDS 

With such a large number of games, the ordinary league format is highly
inefficient. How can we devise a schedule that produces the best team as the
champion with the least number of games? The answer involves preliminary rounds.
In a preliminary round, teams play a small number of games and only the top
teams advance to the next round.

Let us consider a two stage format. The first stage is a preliminary round where
teams play $t_1$ games and then, the teams are ranked according to the outcome
of these games. The top $M \ll N$ teams advance to the final round, and the rest
are eliminated. The final championship round proceeds via a league format with
plenty of games to guarantee that the best team ends up at the top.

We assume that the number of teams advancing to the second round grows
sub-linearly,

$$M \sim N^{\alpha_1}, \tag{11}$$

with $\alpha_1 < 1$. Of course, we must not eliminate the best team. The number
of games $t_1$ required for the top team to finish no worse than Mth place is
obtained by substituting $n_* \sim M$ into (7), $t_1 \sim N^2/M^2$. Since each
of the $N$ teams plays $t_1$ games, the total number of games in the preliminary
round is of the order $N t_1 \sim N^3/M^2 \sim N^{3-2\alpha_1}$. Directly from
(8), the number of games in the final round is $\sim M^3 \sim N^{3\alpha_1}$.
Adding these two contributions, the total number of games, $T_1$, is

$$T_1 \sim N^{3-2\alpha_1} + N^{3\alpha_1}. \tag{12}$$



(12) 



This quantity grows algebraically with the number of teams,
$T_1 \sim N^{\gamma_1}$ with $\gamma_1 = \max(3 - 2\alpha_1, 3\alpha_1)$, and
this exponent is minimal, $\gamma_1 = 9/5$, when

$$\alpha_1 = 3/5. \tag{13}$$

Consequently, $t_1 \sim N^{4/5}$.

Thus, it is possible to significantly improve upon the ordinary league format
using a two-stage procedure. The first stage is a preliminary round in which
each of the $N$ teams plays $t_1 \sim N^{4/5}$ games and then the top
$M \sim N^{3/5}$ teams advance to the final round. The rest of the teams are
eliminated. The first preliminary round requires $N^{9/5}$ games. In the final
round the remaining teams play in a league with each of the possible
$\binom{M}{2}$ pairs of teams playing each other $M$ times. Again the number of
games is $N^{9/5}$ so that in total,

$$T_1 \sim N^{9/5} \tag{14}$$

games are played. This is a substantial improvement over ordinary league play.
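
The balance between the two terms of Eq. (12) can be verified directly. The following sketch scans the cut-off exponent $\alpha_1$ over a grid of exact rationals and returns the value that minimizes $\max(3 - 2\alpha_1, 3\alpha_1)$; the grid resolution and the function names are assumptions made for illustration.

```python
from fractions import Fraction


def two_stage_exponent(alpha: Fraction) -> Fraction:
    """Growth exponent of the total number of games in Eq. (12) for a given cut N**alpha."""
    return max(3 - 2 * alpha, 3 * alpha)


def best_alpha_on_grid(steps: int = 1000) -> Fraction:
    """Scan alpha = 0, 1/steps, ..., 1 and return the value with the smallest exponent."""
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    return min(grid, key=two_stage_exponent)


if __name__ == "__main__":
    alpha = best_alpha_on_grid()
    print("optimal alpha_1 =", alpha)                       # 3/5
    print("optimal gamma_1 =", two_stage_exponent(alpha))   # 9/5
```

Because the first term decreases and the second increases with $\alpha_1$, the minimum sits where the two exponents are equal, which is the same balancing argument used for additional rounds.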

Multiple preliminary rounds further reduce the number of games. Introducing an
additional round, there are now three stages: the first preliminary round, the
second preliminary round, and the championship round. Out of the first round
$N^{\alpha_2}$ teams proceed to the second round and then,
$N^{\alpha_1 \alpha_2}$ teams proceed to the championship round. The total
number of games $T_2$ is a straightforward generalization of (12),

$$T_2 \sim N^{3-2\alpha_2} + N^{\alpha_2(3-2\alpha_1)} + N^{3\alpha_1\alpha_2}. \tag{15}$$



These three terms account respectively for the first round, the second round,
and the final round. The first term is analogous to the first term in (12), and
the last two terms are obtained by replacing $N$ with $N^{\alpha_2}$ in (12).
The total number of games is minimal when all three terms are of the same
magnitude. Comparing the last two terms gives $3 - 2\alpha_1 = 3\alpha_1$ and
therefore, (13) is recovered. Comparing the first two terms gives

$$3 - 2\alpha_2 = \alpha_2 (3 - 2\alpha_1). \tag{16}$$

Thus, $\alpha_2 = 15/19$ and since $\alpha_2 > \alpha_1$, the first elimination
is less drastic than the second one. The total number of games,
$T_2 \sim N^{27/19}$, represents a further improvement.

These results indicate that it is possible to systematically reduce the total
number of games via successive preliminary rounds that lead to the final
championship round. In the most general case, there are $k$ preliminary rounds
in addition to the final round. The number of teams advancing to the second
round, $M_k$, grows as follows

$$M_k \sim N^{\alpha_k}. \tag{17}$$

From (16), the exponent $\alpha_k$ obeys the recursion relation
$3 - 2\alpha_{k+1} = \alpha_{k+1}(3 - 2\alpha_k)$ or equivalently,

$$\alpha_{k+1} = \frac{3}{5 - 2\alpha_k}. \tag{18}$$



By using $\alpha_1 = 3/5$ we deduce the initial element in this series,
$\alpha_0 = 0$. Introducing the transformation $\alpha_k = a_k/a_{k+1}$ reduces
(18) to the Fibonacci-like recursion $3a_{k+2} = 5a_{k+1} - 2a_k$. The general
solution of this equation is $a_k = A r_1^k + B r_2^k$ where $r_1 = 1$ and
$r_2 = 2/3$ are the two roots of the quadratic equation $3r^2 = 5r - 2$. The
coefficients follow from the zeroth element: $\alpha_0 = 0$ implies $a_0 = 0$
and consequently, $a_k = A[1 - (2/3)^k]$. Therefore,

$$\alpha_k = \frac{1 - (2/3)^k}{1 - (2/3)^{k+1}}. \tag{19}$$

TABLE I: The exponents $\alpha_k$, $\beta_k$, and $\gamma_k$ characterizing
$M_k$, the number of teams advancing from the first round, $t_k$, the number of
games played by a team in the first round, and $T_k$, the total number of games,
as a function of the number of preliminary rounds $k$.

  k    alpha_k    beta_k    gamma_k
  1    3/5        4/5       9/5
  2    15/19      8/19      27/19
  3    57/65      16/65     81/65
  4    195/211    32/211    243/211
  5    633/665    64/665    729/665



The exponent $\alpha_k \approx 1 - \frac{1}{3}(2/3)^k$ (for $k \gg 1$)
approaches one exponentially fast (Table I). This means that the number of teams
advancing from the first to the second preliminary round is increasing with the
total number of preliminary rounds played. Nonetheless, the fraction of teams
that are eliminated, $1 - N^{\alpha_k - 1}$, converges to one as
$N \to \infty$. Hence, nearly all of the teams are eliminated in large leagues.

The number of games played by a team in the first round, $t_k$, follows from
(17),

$$t_k \sim N^{\beta_k}, \qquad \beta_k = 2(1 - \alpha_k). \tag{20}$$



Since $\beta_k \to 0$ as $k \to \infty$, only a small number of games is played
in the opening round. Using $T_k \sim N t_k$, we arrive at our main result (1),
$T_k \sim N^{\gamma_k}$, where $\gamma_k = 3 - 2\alpha_k$. Surprisingly, the
total number of games is roughly linear in the number of teams,

$$T_\infty \sim N, \tag{21}$$

when a large number of preliminary rounds is used, i.e., $k \to \infty$ [19].
Clearly, this linear scaling is optimal since every team must play at least
once. The asymptotic behavior $\gamma_k \approx 1 + (2/3)^{k+1}$ implies that in
practice, a small number of preliminary rounds suffices. For example,
$\gamma_4 = 243/211 = 1.15165$ (Table I).
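
The exponents quoted above can be regenerated by iterating the recursion (18) with exact rational arithmetic. This is a minimal sketch: the function names are illustrative, and the starting value $\alpha_0 = 0$ is the one deduced in the text.

```python
from fractions import Fraction


def alpha_sequence(rounds: int) -> list:
    """Iterate alpha_{k+1} = 3 / (5 - 2 alpha_k) from alpha_0 = 0; returns [alpha_1..alpha_k]."""
    alphas = []
    alpha = Fraction(0)
    for _ in range(rounds):
        alpha = Fraction(3) / (5 - 2 * alpha)
        alphas.append(alpha)
    return alphas


def exponent_table(rounds: int) -> list:
    """Rows (k, alpha_k, beta_k, gamma_k) with beta_k = 2(1 - alpha_k), gamma_k = 3 - 2 alpha_k."""
    rows = []
    for k, alpha in enumerate(alpha_sequence(rounds), start=1):
        rows.append((k, alpha, 2 * (1 - alpha), 3 - 2 * alpha))
    return rows


def gamma_closed_form(k: int) -> Fraction:
    """Closed form gamma_k = 1 / (1 - (2/3)^(k+1))."""
    return 1 / (1 - Fraction(2, 3) ** (k + 1))


if __name__ == "__main__":
    for k, alpha, beta, gamma in exponent_table(5):
        print(k, alpha, beta, gamma, float(gamma))
```

Running the loop for larger $k$ shows $\gamma_k$ approaching one geometrically, in line with the statement that a handful of preliminary rounds already brings the schedule close to linear.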

We emphasize that in a $k$-round format, the top $N^{\alpha_k}$ teams proceed to
the second round, out of which the top $N^{\alpha_k \alpha_{k-1}}$ teams proceed
to the third round, and so on. The number of teams proceeding from the kth round
to the championship round is $M \sim N^{\alpha_1 \alpha_2 \cdots \alpha_k}$.
From (21) and $T \sim M^3$, the size of the championship round approaches

$$M_\infty \sim N^{1/3} \tag{22}$$

as $k \to \infty$. This is the optimal size of a playoff that produces the best
champion using the least number of games.
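
To make the elimination schedule concrete, the sketch below lists the exponent of the field size after each preliminary round and converts it into a head count for a given league size. The helper names and the rounding to whole teams are assumptions for illustration; numerical prefactors, which the scaling analysis ignores, are set to one.

```python
from fractions import Fraction


def alpha_sequence(rounds: int) -> list:
    """[alpha_1, ..., alpha_k] from alpha_{k+1} = 3 / (5 - 2 alpha_k) with alpha_0 = 0."""
    alphas, alpha = [], Fraction(0)
    for _ in range(rounds):
        alpha = Fraction(3) / (5 - 2 * alpha)
        alphas.append(alpha)
    return alphas


def advancing_exponents(rounds: int) -> list:
    """Exponents e_j such that about N**e_j teams survive preliminary round j."""
    exponents, product = [], Fraction(1)
    for alpha in reversed(alpha_sequence(rounds)):   # first cut uses alpha_k, last uses alpha_1
        product *= alpha
        exponents.append(product)
    return exponents


def teams_remaining(num_teams: int, rounds: int) -> list:
    """Approximate field sizes after each round (prefactors set to one, an assumption)."""
    return [max(1, round(num_teams ** float(e))) for e in advancing_exponents(rounds)]


if __name__ == "__main__":
    print(advancing_exponents(2))            # exponents after round 1 and round 2
    print(teams_remaining(10**6, 4))         # field sizes for a league of a million teams
    print(float(advancing_exponents(30)[-1]))  # tends to 1/3
```

The product of the $\alpha$'s telescopes, so the final exponent equals $\gamma_k/3$ and tends to $1/3$ as the number of rounds grows.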



IV. NUMERICAL SIMULATIONS 

Our scaling analysis is heuristic: we assumed that N is 
very large and we ignored numerical constants. To verify 
the applicability of our asymptotic results to moderately 
sized leagues, we performed numerical simulations with 
N teams that play an equal number of t games against 
randomly selected opponents. The outcome of each game
is stochastic: with probability $p$ the favorite wins and
with probability $q = 1 - p$, the underdog wins. We present
simulation results for $q = 1/4$.




FIG. 1: The average rank of the champion, $\langle n_* \rangle$, of a league
with $N$ teams after $t$ games (horizontal axis: $t/N$). The simulation results
represent an average over independent realizations for three league sizes $N$.
A line of slope $-1/2$, predicted by Eq. (7), is plotted as a reference.

The most important theoretical prediction is the relation (7) between the rank
of the winner, the number of games, and the size of the league. To test this
prediction, we measured the average rank of the winner as a function of the
number of games $t$, for leagues of various sizes. In the simulations, it is
convenient to shift the rank by one: the teams are ranked from $n = 0$ (the best
team) to $n = N - 1$ (the worst team). With this definition, the average rank
decreases indefinitely with $t$. The simulations show that
$\langle n_* \rangle / N^{1/2} \sim (t/N)^{-1/2}$, thereby confirming the
theoretical prediction (figure 1).

To validate (8), we simulated leagues with a large enough number of games, so
that the best team wins with certainty. For every realization there is a number
of games $T$ after which the champion takes the lead for good. The average of
this random variable, $\langle T \rangle$, measured from the simulations, is in
excellent agreement with the theoretical prediction (figure 2).

The simulations also confirm that the scale $n_*$ characterizes the entire
distribution as in (9). Numerically, we find that the tail of the scaling
function is super-exponential, $\psi(z) \sim \exp(-z^\mu)$ with $\mu > 1$. The
observed tail behavior is consistent with $\mu = 2$, although the numerical
evidence is not conclusive.
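
A league simulation of the kind described in this section can be sketched in a few lines. The version below is only an illustration under stated assumptions: in each round the teams are paired at random so that every team plays exactly one game (with an odd number of teams, one team sits out that round), ties for first place are broken at random, and the function names and sample sizes are choices made here rather than details of the original simulations.

```python
import random


def play_league(num_teams: int, games_per_team: int,
                upset_prob: float, rng: random.Random) -> int:
    """Simulate one season and return the rank (1 = best) of the champion."""
    wins = [0] * num_teams
    order = list(range(num_teams))          # index 0 is the strongest team
    for _ in range(games_per_team):
        rng.shuffle(order)                  # random pairing for this round
        for a, b in zip(order[::2], order[1::2]):
            favorite, underdog = (a, b) if a < b else (b, a)
            winner = underdog if rng.random() < upset_prob else favorite
            wins[winner] += 1
    best = max(wins)
    leaders = [team for team, w in enumerate(wins) if w == best]
    return rng.choice(leaders) + 1          # break ties at random


def average_champion_rank(num_teams: int, games_per_team: int,
                          upset_prob: float = 0.25,
                          realizations: int = 200, seed: int = 0) -> float:
    """Average the champion's rank over independent seasons."""
    rng = random.Random(seed)
    total = sum(play_league(num_teams, games_per_team, upset_prob, rng)
                for _ in range(realizations))
    return total / realizations


if __name__ == "__main__":
    for t in (4, 16, 64, 256):
        print(t, average_champion_rank(64, t, realizations=100, seed=1))
```

Plotting the printed averages against $t$ on logarithmic axes gives a rough check of the predicted $t^{-1/2}$ decay, although small leagues and short seasons show noticeable corrections.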

To verify our prediction that multiple elimination rounds, following the format
suggested above, reduce the number of games, we simulated a single elimination
round ($k = 1$). In the first stage, a total of $N^{9/5}$ games are played. All
teams are then ranked according to the number of wins and the top
$M = N^{3/5}$ teams proceed to the championship round. This final round has an
ordinary league format with a total of $M^3$ games. We simulated leagues of
three different sizes $N$ and observed that the best team wins with a frequency
of 70%. The champion is among the top three teams in 98% of the cases (these
percentages are independent of $N$). As a reference, in an ordinary league with
a total of $N^3$ games, the best team also wins with a likelihood of 70%.
Remarkably, even for as little as $N = 10$ teams, the one preliminary round
format reduces the number of games by a factor > 10. We conclude that the
scaling results are useful at moderate league size $N$.

FIG. 2: The average number of games $\langle T \rangle$ needed for the best team
to emerge as the champion of a league with $N$ teams. The simulation results,
representing an average over independent realizations, are compared with the
theoretical prediction (8), $T \sim N^3$.

FIG. 3: The rank distribution of the league winner for ordinary league format
($t = N$). Shown is the scaled distribution $\sqrt{N}\, Q_n(t = N)$ versus the
scaling variable $n/\sqrt{N}$. The simulation data were obtained using
independent Monte Carlo runs.



V. IMPERFECT CHAMPIONS

Let us relax the condition that the best team must win and implement a less
rigorous championship round. Given a total of $T \sim M^c$ games with
$1 \le c \le 3$, each team plays $t \sim M^{c-1}$ games. From (7), the typical
rank of the winner scales as

$$n_* \sim M^{(3-c)/2}. \tag{23}$$

Suppose that there are infinitely many preliminary rounds. The analysis in
Section III reveals that the total number of games scales linearly,
$T \sim M^c \sim N$, and consequently, $M \sim N^{1/c}$. Therefore, there is a
scaling relation between the rank of the winner and the number of teams,
$n_* \sim N^{(3-c)/(2c)}$. Indeed, the value $c = 3$ produces the best champion.
The common league format ($c = 2$) leads to $n_* \sim N^{1/4}$, an improvement
over the ordinary $N^{1/2}$ behavior.

If there is one preliminary round, Eq. (12) becomes
$T_1 \sim N^{3-2\alpha_1} + N^{c\alpha_1}$ and therefore,
$\alpha_1 = 3/(2 + c)$. Generally for $k$ preliminary rounds, the exponent
$\alpha_k$ satisfies the recursion relation (18), and the scaling relations
$\gamma_k = 3 - 2\alpha_k$ and $\beta_k = 2(1 - \alpha_k)$ remain valid. We
quote the value

$$\gamma_k = \frac{1}{1 - \frac{c-1}{c}\left(\frac{2}{3}\right)^k} \tag{24}$$

that characterizes the total number of games, $T \sim N^{\gamma_k}$. From
$T \sim M^c \sim N^{\gamma_k}$ we conclude $M \sim N^{\gamma_k/c}$.
Substituting this relation into (23) yields

$$n_* \sim N^{\gamma_k (3-c)/(2c)}. \tag{25}$$

Using ordinary league play ($c = 2$) and one preliminary round, $N^{3/2}$ games
are sufficient to produce an imperfect champion of typical rank
$n_* \sim N^{3/8}$. Finally, we note that if each team plays a finite number of
games ($c = 1$), all of the teams have a comparable chance of winning because
the exponent in (25) equals $\gamma_k = 1$, i.e., $n_* \sim N$.
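
The exponents of the imperfect-champion analysis in Section V can be tabulated with the same recursion (18). In the sketch below the starting value $\alpha_0 = (3-c)/2$ is an inference made here so that the first iteration reproduces $\alpha_1 = 3/(2+c)$; it is not stated in the text, and the function name is illustrative.

```python
from fractions import Fraction


def imperfect_exponents(c, rounds: int):
    """Return (gamma_k, rank exponent) for a final round with T ~ M**c games.

    gamma_k is the exponent of the total number of games, T ~ N**gamma_k, and the
    rank exponent is gamma_k (3 - c) / (2 c) from Eq. (25).
    """
    alpha = Fraction(3 - c, 2)              # assumed alpha_0, so that alpha_1 = 3/(2 + c)
    for _ in range(rounds):
        alpha = Fraction(3) / (5 - 2 * alpha)
    gamma = 3 - 2 * alpha
    rank_exponent = gamma * (3 - c) / (2 * c)
    return gamma, rank_exponent


if __name__ == "__main__":
    for c in (1, 2, 3):
        for k in (0, 1, 2, 3):
            gamma, nu = imperfect_exponents(c, k)
            print(f"c={c} k={k} games ~ N^{gamma}  champion rank ~ N^{nu}")
```

With zero preliminary rounds the function returns $\gamma = c$ and a rank exponent of $(3-c)/2$, which is Eq. (23) with $M = N$, so the no-elimination league is recovered as a special case.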



VI. CONCLUSIONS 

In summary, we studied dynamics of league competi- 
tion with fixed team strength and a finite upset proba- 
bility. We demonstrated that ordinary league play where 
all teams play an equal number of games requires a very 
large number of games for the best team to win with 
certainty. We also showed that a series of preliminary
rounds with a small but sufficient number of games to
successively eliminate the weakest teams is a fair and
efficient way to identify the champion. We obtained scaling
laws for the number of advancing teams and the number of
games in each preliminary round. Interestingly, it
is possible to determine the best team by having teams
play, on average, only a finite number of games
(independent of league size). The optimal size of the final
championship round scales as the one-third power of the
number of teams.

Empirical validation of these results with real data may 
be possible using sports leagues, for example. The chal- 
lenge is that the inherent strength of each team is not 
known. In professional sports, a team's budget can serve 
as a proxy for its strength. With this definition, the
average rank of the American baseball World Series champion,
over the past 30 years, equals 6. There are, however, huge
fluctuations: while the top team won 7 times, a team 
ranked as low as 26 (2003 Florida Marlins) also won. 

With wide-ranging applications, including for example evolution [20, 21],
leadership statistics is a challenging extreme statistics problem because the
record of one team constrains the records of all other teams. Our scaling
approach, based on the record of a fixed team, ignores such correlations. While
these correlations do not affect the scaling laws, they do affect the
distribution of outcomes such as the distribution of the rank of the winner, and
the distribution of the number of games needed for the best team to take the
lead for good. Other interesting questions include the expected number of
distinct leaders, and the number of lead changes as a function of league size
[22, 23].